[SPARK-12602] [SQL] Join Reordering: Pushing Inner Join Through Left/Right Outer Join #10551
Closed
gatorsmile wants to merge 6 commits into apache:master from
Test build #48575 has finished for PR 10551 at commit
Member
Author
retest this please
Test build #48576 has finished for PR 10551 at commit
Test build #48577 has finished for PR 10551 at commit
Test build #48578 has finished for PR 10551 at commit
Test build #48588 has finished for PR 10551 at commit
Test build #48593 has finished for PR 10551 at commit
Contributor
Mind closing this one as well?
Member
Author
Let me close it. Thanks!
This PR pushes Inner Join through Left/Right Outer Join. The basic idea is built on the associativity property of outer and inner joins:
R1 inner (R2 left R3 on p23) on p12 = (R1 inner R2 on p12) left R3 on p23
R1 inner (R2 right R3 on p23) on p13 = R2 right (R1 inner R3 on p13) on p23 = (R1 inner R3 on p13) left R2 on p23
(R1 left R2 on p12) inner R3 on p13 = (R1 inner R3 on p13) left R2 on p12
(R1 right R2 on p12) inner R3 on p23 = R1 right (R2 inner R3 on p23) on p12 = (R2 inner R3 on p23) left R1 on p12
In this PR, the reordering can reduce the number of processed rows, since an Inner Join always generates fewer rows than (or the same number as) the corresponding Left/Right Outer Join. The join predicates of a Left/Right Outer Join do not remove rows from the preserved side, so they do not reduce the number of returned rows. This PR can improve query performance in most cases, especially when the join predicates of the Inner Join are highly selective.
When cost-based optimization is available, we can switch the order of tables in each join type based on their costs. The order of joined tables in an inner join does not affect the results, and a right outer join can be changed to a left outer join. This part is out of scope here.
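The equivalences can be checked on small in-memory relations without Spark. The following is only a minimal sketch, not the optimizer rule in this PR: the object `JoinReorderSketch`, the helpers `rel`, `on`, `innerJoin`, `leftOuterJoin` and `bag`, and the sample data are all hypothetical names introduced for illustration. Rows are modelled as maps from column name to `Option[Int]`, with `None` standing in for SQL NULL, and results are compared as multisets so that column order does not matter.

```scala
object JoinReorderSketch {
  // A row maps column names to values; None models SQL NULL.
  type Row = Map[String, Option[Int]]
  type Rel = Seq[Row]

  // Hypothetical helper: build a single-column relation.
  def rel(col: String, values: Option[Int]*): Rel =
    values.map(v => Map(col -> v)).toList

  // Equi-join predicate with SQL semantics: NULL never matches anything.
  def on(l: String, r: String): Row => Boolean =
    row => (row.get(l).flatten, row.get(r).flatten) match {
      case (Some(x), Some(y)) => x == y
      case _                  => false
    }

  // Nested-loop inner join: keep only the combined rows that satisfy p.
  def innerJoin(left: Rel, right: Rel, p: Row => Boolean): Rel =
    left.flatMap(l => right.map(r => l ++ r).filter(p))

  // Nested-loop left outer join: unmatched left rows are padded with NULLs
  // for the right-hand columns (which must be named explicitly here).
  def leftOuterJoin(left: Rel, right: Rel, rightCols: Set[String],
                    p: Row => Boolean): Rel =
    left.flatMap { l =>
      val matches = right.map(r => l ++ r).filter(p)
      if (matches.nonEmpty) matches
      else Seq(l ++ rightCols.map(c => c -> (None: Option[Int])).toMap)
    }

  // Compare results as multisets (duplicates count, order does not).
  def bag(r: Rel): Map[Row, Int] =
    r.groupBy(identity).map { case (row, dups) => row -> dups.size }

  // Sample data (assumed): duplicates, NULLs and non-matching keys.
  val r1: Rel = rel("a", Some(1), Some(2), Some(2), Some(3), Some(4), None)
  val r2: Rel = rel("b", Some(2), Some(4), None)
  val r3: Rel = rel("c", Some(1), Some(2), Some(5))

  val p12 = on("a", "b")
  val p13 = on("a", "c")
  val p23 = on("b", "c")

  // R1 inner (R2 left R3 on p23) on p12 = (R1 inner R2 on p12) left R3 on p23
  def rule1Holds: Boolean =
    bag(innerJoin(r1, leftOuterJoin(r2, r3, Set("c"), p23), p12)) ==
      bag(leftOuterJoin(innerJoin(r1, r2, p12), r3, Set("c"), p23))

  // (R1 left R2 on p12) inner R3 on p13 = (R1 inner R3 on p13) left R2 on p12
  def rule3Holds: Boolean =
    bag(innerJoin(leftOuterJoin(r1, r2, Set("b"), p12), r3, p13)) ==
      bag(leftOuterJoin(innerJoin(r1, r3, p13), r2, Set("b"), p12))

  // (R1 right R2 on p12) inner R3 on p23 = (R2 inner R3 on p23) left R1 on p12,
  // writing "R1 right R2" as "R2 left R1".
  def rule4Holds: Boolean =
    bag(innerJoin(leftOuterJoin(r2, r1, Set("a"), p12), r3, p23)) ==
      bag(leftOuterJoin(innerJoin(r2, r3, p23), r1, Set("a"), p12))

  def main(args: Array[String]): Unit = {
    println(s"rule 1 holds on sample data: $rule1Holds")
    println(s"rule 3 holds on sample data: $rule3Holds")
    println(s"rule 4 holds on sample data: $rule4Holds")
  }
}
```

A check on one data set is not a proof of the equivalences. It only illustrates that, when the inner-join predicate does not reference the null-supplying side of the outer join, both join orders return the same rows.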
For example, given the following eligible query:
df.join(df2, $"a.int" === $"b.int", "right").join(df3, $"c.int" === $"b.int", "inner")
Before the fix, the logical plan is like
After the fix, the logical plan is like